Basic Python for Economic Analysis

Inter-American Development Bank Workshop

Diego A. Guerrero

2025-11-23

Learning Objectives

  • Become familiar with the basic concepts of Python and R
  • Identify and manipulate data structures
  • Write and run Python / R scripts
  • Perform basic data loading and cleaning
  • Compute summary statistics and visualize simple data

Introduction to Python

Why?

  • Python is useful due to its speed, reproducibility, flexibility, and ecosystem of libraries

  • Integrates with R, SQL, Excel, Stata, and APIs

  • Scale: working with a small dataset? Good. Working with Big Data in the cloud? Perfect

  • Free

Introduction to Python

  • Install Anaconda (includes Python, Jupyter, Pandas, NumPy, etc.)
  • Create an environment
  • Install libraries in the environment
  • Launch Jupyter Notebook or JupyterLab

Install Anaconda

Why: Anaconda bundles Python, conda (env manager), Jupyter, and many data science packages.

Steps

  1. Go to the Anaconda download page (choose Python 3.x installer) and download for your OS.

  2. Run the installer and follow the prompts (accepting the default options is fine for most users).

Terminal

On Windows, open a terminal by searching for the Anaconda Prompt in the Start menu.

Verify installation

Open a terminal / Anaconda Prompt and run:

conda --version
python --version
jupyter --version

Create conda environment

Why: Keeps project dependencies isolated (safer for reproducibility and error control).

Steps

  1. Go to the Anaconda Prompt and type:
conda create -n econ -y
  2. Activate the environment:
conda activate econ

Libraries

Install required packages/libraries to run your program

Why: A new environment starts clean, so you choose exactly which dependencies and libraries that project will use.

conda activate econ
conda install pandas -y 
conda install numpy matplotlib seaborn jupyterlab openpyxl statsmodels scikit-learn -y

Libraries (pip)

Some packages are unavailable through the Anaconda channels. You can install them with pip:

conda activate econ
pip install scikit-learn

Tip: Prefer conda install (faster, fewer build issues); use pip only when needed.

(Other packages are far more complex. You may need to download a wheel.)

Key Libraries

Ready to Launch!

We will launch Python from the Anaconda Prompt.

The standard interface is JupyterLab or the classic Jupyter Notebook.

Alternatives: spyder (Scientific Python Development Environment), vscode, PyCharm, Google Colab…

conda activate econ
jupyter lab        # modern interface
# or
jupyter notebook   # classic notebook

Jupyter

First python program

Create a Notebook using the Python kernel. We are ready to run our first program:

print("Hello world")

print(2+5)
Hello world
7

Python syntax

Variables, Types

What are variables?

Variables store information for later use.

country = "Bahamas"
gdp = 12_500
growth_rate = 0.032
island = True
print(country, gdp)
print(growth_rate, island)
Bahamas 12500
0.032 True

Warning: Variables are stored in RAM, so very large objects can exhaust your machine's memory.

Data Types

a = 10 # integer
b = 3.14 # float
c = "Economics" # string
d = True # boolean
print(type(a), type(b), type(c), type(d))
<class 'int'> <class 'float'> <class 'str'> <class 'bool'>

Common types:

  • int: whole numbers
  • float: decimals
  • str: text
  • bool: True/False

Printing variables as string

country = "Bahamas"
gdp = 12_500
growth_rate = 0.032
island = True
print(f"The country {country} has a GDP of {gdp} \n and a growth rate of {growth_rate*100}%.")
The country Bahamas has a GDP of 12500 
 and a growth rate of 3.2%.

Operations

gdp = 12_500
growth_rate = 0.032
gdp_2 = gdp*(1+growth_rate)

print(f"Bahamas grew {growth_rate*100}% and now has a GDP of {gdp_2}")
Bahamas grew 3.2% and now has a GDP of 12900.0
# It is a good practice to comment the code.
# We first define the variables
gdp_1 = 12_500
gdp_2 = 12_900

# And we can now calculate the growth rate:
growth = gdp_2/gdp_1 - 1

print(f"The growth rate was {growth*100}%.")
# Or round when printing
print(f"The growth rate was {round(growth*100,2)}%.")
# Or, first define the rounded growth
growth = round(growth,2)
print(f"The growth rate was {growth*100}%.")
The growth rate was 3.200000000000003%.
The growth rate was 3.2%.
The growth rate was 3.0%.

Conditional statements

Python includes comparison operators (>, <, >=, <=, ==, !=) and the membership/identity keywords (in, is, not) that return booleans (True/False)

print(5+15 > 10)
print(5 + 15 < 10 )
print(2 >= 5)
print(2 == 4/2)
print("a" != "a")
True
False
False
True
False

We can use if, elif and else with the operators above.

x = "abc"
if type(x) == str:
    print(f"x({x}) is a string")
else:
    print(f"{x} is {type(x)}")
x(abc) is a string

Conditional statement example

#score = float(input("Enter your exam score: "))
score = 50

if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
elif score >= 70:
    print("Grade: C")
elif score >= 60:
    print("Grade: D")
else:
    print("Grade: F")

print("Program finished.")
Grade: F
Program finished.

Data structures (collections)

Data structures organize and store multiple values efficiently.
They are different from simple variables that hold only one value.

Structure | Example | Key Features
List | [1, 2, 3] | Ordered, mutable
Tuple | (1, 2, 3) | Ordered, immutable
Dictionary | {"name": "Ana", "age": 30} | Key–value pairs
Set | {1, 2, 3} | Unordered, unique elements
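The four structures above can be tried directly in a notebook; the values below are made up for illustration:

```python
# One example of each collection (hypothetical values)
gdp_list = [2.1, 0.015, 1.8]                   # list: ordered, mutable
coords = (25.0, -77.4)                         # tuple: ordered, immutable
country = {"name": "Bahamas", "gdp": 12_500}   # dictionary: key-value pairs
regions = {"LAC", "LAC", "EU"}                 # set: duplicates are dropped

gdp_list.append(0.5)        # lists can grow in place
print(gdp_list)             # [2.1, 0.015, 1.8, 0.5]
print(country["gdp"])       # 12500
print(regions)              # {'LAC', 'EU'} (order not guaranteed)
```

Trying to modify a tuple, e.g. coords[0] = 0, raises a TypeError.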

Loops

  • for: iterate over a sequence (list, tuple, dictionary, string, range).
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
apple
banana
cherry
  • while: Repeats as long as a condition is true.
count = 0
while count < 5:
    print("Count is:", count)
    count += 1
Count is: 0
Count is: 1
Count is: 2
Count is: 3
Count is: 4

Some useful statements: break (stops the loop), continue (skips code on that iteration), pass (does nothing/placeholder)

Loops and Conditionals

for num in range(10):
    if num == 8:
        break  # Stop the loop completely
    elif num == 3:
        continue  # Skip number 3
    print(f"Number: {num}")
Number: 0
Number: 1
Number: 2
Number: 4
Number: 5
Number: 6
Number: 7

Functions

A function is a reusable block of code that performs a specific task.
It helps you avoid repetition and organize your program.

def greet(name):
    """This function greets the person passed in as a parameter."""
    print("Hello,", name)

greet("Diego")
Hello, Diego

return sends a value back from the function to the place where the function was called.
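For example, the growth-rate calculation from earlier can be wrapped in a function that returns its result (a small sketch; the function name is our own):

```python
def growth_rate(gdp_old, gdp_new):
    """Return the growth rate between two GDP values."""
    return gdp_new / gdp_old - 1

# The returned value can be stored and reused
g = growth_rate(12_500, 12_900)
print(f"The growth rate was {round(g * 100, 2)}%.")  # The growth rate was 3.2%.
```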

Function uses

Suppose you want to run a model and save its results in a table, then run a different model and build another table. Wrapping the repeated steps in a function saves space and avoids copy-paste errors.

def make_table(values):
    """Return a list of (index, value) pairs as a simple table."""
    table = []
    for i, v in enumerate(values):
        table.append((i, v))
    return table

data = [200, -15, 100]
result = make_table(data)

print("Index | Value")
print("--------------")
for row in result:
    print(f"{row[0]:5} | {row[1]:6}")
Index | Value
--------------
    0 |    200
    1 |    -15
    2 |    100

Dataframes

Table-like structure (pandas library) with rows/columns

import pandas as pd
data = {
    "Country": ["Brazil", "Bahamas", "Mexico"],
    "GDP": [2.1, 0.015, 1.8],
    "Population": [214, 0.401, 130]
}
df = pd.DataFrame(data)
df
Country GDP Population
0 Brazil 2.100 214.000
1 Bahamas 0.015 0.401
2 Mexico 1.800 130.000

Data in python

Importing

import pandas as pd
brent = pd.read_csv("session_1_files/eia_brent.csv")
# Quick preview
brent.head()
date brent price
0 1987-05-20 18.63
1 1987-05-21 18.45
2 1987-05-22 18.55
3 1987-05-25 18.60
4 1987-05-26 18.63

Importing various formats

Pandas supports many ways to import data.

# Excel file
pd.read_excel("session_1_files/data.xlsx")
# Text file
pd.read_table("session_1_files/data.txt")

Exporting

Saving data is easy.

import pandas as pd
data = {
    "Country": ["Brazil", "Bahamas", "Mexico"],
    "GDP": [2.1, 0.015, 1.8],
    "Population": [214, 0.401, 130]
}
df = pd.DataFrame(data)

df.to_csv("session_1_files/example.csv", index=False)
df.to_csv("session_1_files/example.txt", sep="\t", index=False)
df.to_excel("session_1_files/example.xlsx", index=False)
df.to_json("session_1_files/example.json", orient="records")

JSON is a flexible format, widely used in data science, that stores records as dictionaries (key–value pairs)

Data Formats

Format | Structure | Best for | Advantages | Limitations
CSV / TXT | Flat (rows & columns) | Tabular data, spreadsheets | Simple and lightweight; works everywhere; easy to inspect | No nested data; no metadata; loses data types
Excel (.xlsx) | Flat (cells, sheets) | Business / reports | Formatting, formulas; multiple sheets; familiar to most users | Slow; not ideal for automation
JSON | Hierarchical (nested) | Web data, APIs, configs | Stores complex/nested data; language-independent; APIs | Larger files; harder to view in Excel
Parquet / Feather | Columnar (binary) | Big data, analytics | Very fast I/O; keeps data types; compressed & efficient | Not human-readable; requires Python/R
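The "loses data types" limitation of CSV can be seen in a small round trip (an in-memory buffer is used here so no file is written):

```python
import io
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2020-01-31"]), "gdp": [2.1]})
print(df.dtypes["date"])    # datetime64[ns]

# Write to CSV and read it back: the datetime comes back as plain text
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(df2.dtypes["date"])   # object (a string, unless parse_dates is used)
```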

Data exploration

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")

# Show first rows
df.head()

# Show last rows
df.tail()

Data exploration

date exports USA imports USA
476 2024-09-30 36.066283 368.731802
477 2024-10-31 109.539796 305.212492
478 2024-11-30 219.298712 636.946114
479 2024-12-31 366.392066 520.629852
480 2025-01-31 65.009249 506.231056

DataFrame Summary

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")

print("====================")
print("Data info:")
print(df.info()) # Column names, data types, missing values

print("====================")
print("Summary stats:")
print(df.describe()) # Summary stats for numeric columns

print("====================")
print("Size:")
print(df.shape) # (rows, columns)

print("====================")
print("Column/Variable names:")
print(df.columns) # List of column names

print("====================")
print("Data types:")
print(df.dtypes) # Data types of each column

DataFrame: info()

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 481 entries, 0 to 480
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         481 non-null    object 
 1   exports USA  481 non-null    float64
 2   imports USA  481 non-null    float64
dtypes: float64(2), object(1)
memory usage: 11.4+ KB

DataFrame: describe()

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")
df.describe()
exports USA imports USA
count 481.000000 481.000000
mean 45.783287 170.553382
std 44.591570 128.572069
min 5.500000 43.200000
25% 23.500000 65.200000
50% 34.800000 125.860780
75% 52.100000 243.605530
max 381.395095 850.473691

Selecting

Suppose your dataset has many columns and you want to see descriptive statistics of only one.

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")
df['exports USA'].describe()
count    481.000000
mean      45.783287
std       44.591570
min        5.500000
25%       23.500000
50%       34.800000
75%       52.100000
max      381.395095
Name: exports USA, dtype: float64

Indexes

When dealing with time series, it is a good idea to set the preferred index

import pandas as pd
df = pd.read_csv("session_1_files/uscensus_trade.csv")
df.head()
date exports USA imports USA
0 1985-01-31 154.4 59.0
1 1985-02-28 73.7 43.2
2 1985-03-31 47.7 43.4
3 1985-04-30 25.5 57.9
4 1985-05-31 38.3 68.0

Set Index

When dealing with time series, it is a good idea to set the preferred index

import pandas as pd
df = pd.read_csv(
    "session_1_files/uscensus_trade.csv", 
    index_col='date'
    )
df.head()
exports USA imports USA
date
1985-01-31 154.4 59.0
1985-02-28 73.7 43.2
1985-03-31 47.7 43.4
1985-04-30 25.5 57.9
1985-05-31 38.3 68.0

Manage dates

Sometimes we need to make sure a column is read as a date. Use pd.to_datetime()

import pandas as pd
df = pd.read_csv(
    "session_1_files/uscensus_trade.csv"
    )

print(f"The column type: {df['date'].dtype}")
df.date = pd.to_datetime(df['date'], format="%Y-%m-%d")
print(f"After transforming: {df['date'].dtype}")
The column type: object
After transforming: datetime64[ns]

Useful when merging/standardizing/visualizing a dataset.
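Once a column is a datetime, the .dt accessor exposes its parts, which helps when standardizing dates across datasets (the strings below are hypothetical):

```python
import pandas as pd

s = pd.Series(["2020-01-31", "2020-02-29"])
dates = pd.to_datetime(s, format="%Y-%m-%d")

# Extract date parts from the parsed column
print(dates.dt.year.tolist())   # [2020, 2020]
print(dates.dt.month.tolist())  # [1, 2]
```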

Simple visualization

import pandas as pd
df = pd.read_csv( 
    "session_1_files/uscensus_trade.csv",
     index_col='date',
     parse_dates=True
     )
df.plot()

Filtering

import pandas as pd
df = pd.read_csv( 
    "session_1_files/uscensus_trade.csv",
     index_col='date',
     parse_dates=True
     )
df[df.index>"2020-01-01"].plot()

Complex Filtering

import pandas as pd
df = pd.read_csv( 
    "session_1_files/uscensus_trade.csv",
     index_col='date',
     parse_dates=True
     )
df[ (df.index>"2020-01-01") & (df.index<"2024-01-01")].plot()

Also df.loc["2020-01-01":"2025-01-01"].plot()

Combining datasets: concat, merge, join

Method | Typical Use | Based On | Example | Notes
pd.concat() | Stack datasets vertically (rows) or horizontally (columns) | Axis (0 = rows, 1 = cols) | pd.concat([df1, df2]) | Simple append; doesn’t match keys
pd.merge() | Combine on one or more columns (like SQL JOIN) | Key columns | pd.merge(df1, df2, on="id") | Most flexible; supports inner, left, right, outer joins
df.join() | Combine on index (row labels) | Index alignment | df1.join(df2) | Convenient when indices are meaningful
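The three methods can be compared on two tiny frames; df1 and df2 below are reconstructed to match the outputs shown on the following slides:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "value": ["A", "B", "C"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [80.0, 90.0, 70.0]})

stacked = pd.concat([df1, df2])                         # stacks rows; keys are not matched
merged = pd.merge(df1, df2, on="id", how="left")        # SQL-style left join on "id"
joined = df1.set_index("id").join(df2.set_index("id"))  # aligns on the index

print(stacked)
print(merged)
print(joined)
```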

concat

id value score
0 1 A NaN
1 2 B NaN
2 3 C NaN
0 2 NaN 80.0
1 3 NaN 90.0
2 4 NaN 70.0

merge

value score
id
1 A NaN
2 B 80.0
3 C 90.0

join

value score
id
1 A NaN
2 B 80.0
3 C 90.0

Reshaping

wide → long: melt()

import pandas as pd
df = pd.read_csv( 
    "session_1_files/uscensus_trade.csv",
     parse_dates=True
     )

long_df = pd.melt(df, id_vars='date', 
                  var_name="Type", value_name="Value")
long_df
date Type Value
0 1985-01-31 exports USA 154.400000
1 1985-02-28 exports USA 73.700000
2 1985-03-31 exports USA 47.700000
3 1985-04-30 exports USA 25.500000
4 1985-05-31 exports USA 38.300000
... ... ... ...
957 2024-09-30 imports USA 368.731802
958 2024-10-31 imports USA 305.212492
959 2024-11-30 imports USA 636.946114
960 2024-12-31 imports USA 520.629852
961 2025-01-31 imports USA 506.231056

962 rows × 3 columns

Reshaping: long → wide: pivot()

import pandas as pd
df = pd.read_csv( 
    "session_1_files/uscensus_trade.csv",
     parse_dates=True
     )

long_df = pd.melt(df, id_vars='date', 
                  var_name="Type", value_name="Value")

wide_df = long_df.pivot(index="date", columns="Type", values="Value")
wide_df
Type exports USA imports USA
date
1985-01-31 154.400000 59.000000
1985-02-28 73.700000 43.200000
1985-03-31 47.700000 43.400000
1985-04-30 25.500000 57.900000
1985-05-31 38.300000 68.000000
... ... ...
2024-09-30 36.066283 368.731802
2024-10-31 109.539796 305.212492
2024-11-30 219.298712 636.946114
2024-12-31 366.392066 520.629852
2025-01-31 65.009249 506.231056

481 rows × 2 columns

Reshape: resample()

Transforms data into a different frequency (e.g. daily to monthly/quarterly)

import pandas as pd
import matplotlib.pyplot as plt

# Load the daily Brent price dataset
df = pd.read_csv(
    "session_1_files/eia_brent.csv",
    parse_dates= True,
    index_col = "date")

monthly_df = df.resample("M").mean()

plt.figure(figsize=(8,4))
plt.plot(df.index, df["brent price"], alpha=0.4, label="Daily")
plt.plot(monthly_df.index, monthly_df["brent price"], color="red", linewidth=2, label="Monthly")
plt.title("Daily vs. Monthly Brent price", fontsize=14, weight="bold")
plt.legend(frameon=False)
plt.show()

Resample: example

(Figure: daily Brent price series with the monthly resampled average overlaid.)

Resample Frequencies

Code | Frequency | Example Dates
"D" | Daily | 2020-01-01, 2020-01-02
"W" | Week-End | 2020-01-05, 2020-01-12
"M" | Month-End | 2020-01-31, 2020-02-29
"MS" | Month-Start | 2020-01-01, 2020-02-01
"Q" | Quarter-End | 2020-03-31, 2020-06-30
"QS" | Quarter-Start | 2020-01-01, 2020-04-01
"A" | Year-End | 2020-12-31, 2021-12-31
"AS" | Year-Start | 2020-01-01, 2021-01-01

Aggregation functions

Function | Description | Example
.mean() | Average over the period | df.resample("M").mean()
.sum() | Total over the period | df.resample("M").sum()
.last() | Last observation | df.resample("M").last()
.first() | First observation | df.resample("M").first()
.max() | Maximum value | df.resample("M").max()
.min() | Minimum value | df.resample("M").min()
.median() | Median value | df.resample("M").median()
.agg(["mean","max"]) | Multiple aggregations | df.resample("Q").agg(["mean","max"])
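The last row of the table, chained with resample(), can be sketched on a hypothetical daily series:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering two quarters of 2020
idx = pd.date_range("2020-01-01", "2020-06-30", freq="D")
df = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

# Several aggregations per quarter in a single call;
# the result has one column per (variable, aggregation) pair
q = df.resample("Q").agg(["mean", "max"])
print(q)
```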

Activity

Loading and visualizing a dataset

  • Load two datasets (GDP and stock index)
  • Perform data cleaning (e.g., standardizing date formats, merging, reshape)
  • Reshape/resample/merge as needed
  • Calculate basic descriptive statistics
  • Generate time series plot

Appendix - Visualization: a quick look

Line Plot

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
date_range = pd.date_range("2015-01-01", periods=60, freq="Q")
gdp = 1000 + np.cumsum(np.random.normal(5, 10, len(date_range)))
inflation = np.random.uniform(1.5, 4.0, len(date_range))
df = pd.DataFrame({"date": date_range, "GDP": gdp, "Inflation": inflation})

plt.figure(figsize=(8,5), facecolor="white")
gdp_color = "#1f77b4"        # a strong blue
inflation_color = "#d62728"  # a contrasting red

plt.plot(df["date"], df["GDP"], label="GDP (in billions)", linewidth=2.5, color=gdp_color)
plt.plot(df["date"], df["Inflation"]*250, label="Inflation (scaled)", linestyle="--", color=inflation_color)

plt.title("GDP and Inflation Over Time", fontsize=16, weight="bold")
plt.xlabel("Date")
plt.ylabel("Index / Scaled Value")
plt.legend(frameon=False)
plt.grid(alpha=0.3)
plt.gca().set_facecolor("white")   
plt.show()

Bar chart

sectors = ["Agriculture", "Manufacturing", "Tourism", "IT", "Finance"]
gdp_share = [5, 25, 20, 30, 20]

plt.figure(figsize=(7,4))
bars = plt.bar(sectors, gdp_share, color=["#1f77b4","#ff7f0e","#2ca02c","#9467bd","#d62728"])
plt.title("GDP Share by Sector", fontsize=16, weight="bold")
plt.ylabel("Percent of Total GDP")
plt.grid(axis="y", alpha=0.3)
plt.show()

Box Plot

import seaborn as sns
np.random.seed(42)
data = pd.DataFrame({
    "Sector": np.repeat(["Agriculture","Tourism","Finance","IT"], 50),
    "Wage": np.concatenate([
        np.random.normal(800, 100, 50),
        np.random.normal(1200, 150, 50),
        np.random.normal(2500, 300, 50),
        np.random.normal(2200, 250, 50)
    ])
})

plt.figure(figsize=(8,5))
sns.boxplot(data=data, x="Sector", y="Wage", palette="Set2")
plt.title("Wage Distribution by Sector", fontsize=16, weight="bold")
plt.ylabel("Monthly Wage (USD)")
plt.grid(axis="y", alpha=0.3)
plt.show()

Scatter plot

gdp = np.random.uniform(1000, 4000, 40)
employment = gdp*0.04 + np.random.normal(0, 50, 40)

plt.figure(figsize=(7,5))
plt.scatter(gdp, employment, s=80, c="#1f77b4", alpha=0.7, edgecolors="white", linewidths=1)
plt.title("GDP vs Employment", fontsize=16, weight="bold")
plt.xlabel("GDP (in millions)")
plt.ylabel("Employment (in thousands)")
plt.grid(alpha=0.3)
plt.show()

Interactive Plotting with Plotly

import plotly.express as px
import pandas as pd

# Load the dataset
df = px.data.gapminder().query("year == 2007")

# Create an interactive scatter plot
fig = px.scatter(df, x="gdpPercap", y="lifeExp", 
                 size="pop", color="continent",
                 hover_name="country", log_x=True,
                 size_max=60)
fig

Seaborn Advanced Visualization

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8')
np.random.seed(42)

# --- Create a fake dataset ---
sectors = ["Agriculture", "Tourism", "Finance", "IT"]
data = pd.DataFrame({
    "Sector": np.repeat(sectors, 80),
    "Wage": np.concatenate([
        np.random.normal(800, 100, 80),
        np.random.normal(1200, 150, 80),
        np.random.normal(2500, 300, 80),
        np.random.normal(2200, 250, 80)
    ]),
    "Hours": np.concatenate([
        np.random.normal(45, 5, 80),
        np.random.normal(42, 4, 80),
        np.random.normal(38, 3, 80),
        np.random.normal(40, 4, 80)
    ])
})

# --- Create figure with subplots ---
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot 1: Violin plot for Wage distribution
sns.violinplot(data=data, x="Sector", y="Wage", ax=ax1, palette="Set2", inner="box")
ax1.set_title("Wage Distribution by Sector", fontsize=14, weight="bold")
ax1.set_xlabel("")
ax1.set_ylabel("Monthly Wage (USD)")
ax1.grid(alpha=0.3)

# Plot 2: Box plot for Working Hours
sns.boxplot(data=data, x="Sector", y="Hours", ax=ax2, palette="Set3")
ax2.set_title("Working Hours by Sector", fontsize=14, weight="bold")
ax2.set_xlabel("")
ax2.set_ylabel("Weekly Hours")
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.show()
